Current multi-document summarization systems can successfully extract summary sentences, but with many limitations, including low coverage, inaccurate extraction of important sentences, redundancy, and poor coherence among the selected sentences. The present study introduces a new concept of the centroid approach and reports new techniques for extracting summary sentences from multiple documents. In both techniques, keyphrases are used to weigh sentences and documents. The first summarization technique (Sen-Rich) prefers the sentences with maximum richness, while the second (Doc-Rich) prefers sentences from the centroid document. To demonstrate the application of the new summarization system to Arabic documents, we performed two experiments. First, we applied the ROUGE measure to compare the new techniques with the systems presented at TAC 2011. The results show that Sen-Rich outperformed all systems in ROUGE-S. Second, the system was applied to summarize multi-topic documents. According to human evaluators, Doc-Rich is superior: its summary sentences are characterized by greater coverage and more cohesion.
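
For readers unfamiliar with ROUGE-S, the following minimal Python sketch illustrates the general idea of the metric: it scores a candidate summary against a reference by the overlap of skip-bigrams, that is, ordered word pairs with any gap between them. It is not the evaluation code used in the study. The function names `skip_bigrams` and `rouge_s` are hypothetical, whitespace tokenization is an assumption (Arabic text would normally need proper tokenization), and the sketch places no limit on the skip distance.

```python
from collections import Counter
from itertools import combinations


def skip_bigrams(tokens):
    """Count every ordered pair of tokens (with any gap) in the sequence."""
    return Counter(combinations(tokens, 2))


def rouge_s(candidate, reference, beta=1.0):
    """Return (recall, precision, f_score) from skip-bigram overlap.

    Assumption: both inputs are plain strings split on whitespace.
    """
    cand = skip_bigrams(candidate.split())
    ref = skip_bigrams(reference.split())
    # Clipped overlap: each pair counts at most as often as it occurs in both.
    overlap = sum((cand & ref).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    if overlap == 0 or cand_total == 0 or ref_total == 0:
        return 0.0, 0.0, 0.0
    recall = overlap / ref_total
    precision = overlap / cand_total
    f_score = (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
    return recall, precision, f_score
```

For instance, comparing the candidate "police kill the gunman" with the reference "police killed the gunman" gives three shared skip-bigrams out of six on each side, so recall, precision, and F-score are all 0.5.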